Practical Fast Searching in Strings
Author
Abstract
The problem is that of searching a large block of text to find the first occurrence of a substring (which we will call the ‘pattern’). This operation is provided in most text editing systems, and it also has applications in bibliographic retrieval systems. Since the text to be searched can be overwhelmingly large (perhaps hundreds of thousands of characters), it is important to use efficient techniques. Simple programs for searching text typically require a worst-case running time of O(mn), where m is the length of the pattern and n is the length of the text. However, Knuth et al. [5] showed that this time can be reduced to O(n) with a fairly complicated algorithm. Later, Boyer and Moore published a simpler, practical algorithm [1] that also has a linear worst-case running time. In the average case, their algorithm inspects only a small fraction of the n text characters. (A recent paper by Galil [3] reports some improvements to the worst-case behaviour of the Boyer and Moore algorithm.) Many programmers may not believe that the Boyer and Moore algorithm (if they have heard of it) is a truly practical approach. The purpose of this paper is to demonstrate that it is, and to show the circumstances under which it should be employed. Many computers, particularly the larger machines, possess instructions that search main memory for individual characters. One might think that these instructions would permit routines that could beat the Boyer and Moore algorithm. However, we show experimentally that this is not always the case, even when the search instructions are used in the most efficient manner imaginable.
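To make the contrast between simple searching and the Boyer and Moore approach concrete, the following minimal Python sketch counts character comparisons for two methods. The function names are hypothetical, chosen for illustration. `naive_search` is the simple left-to-right method with O(mn) worst-case time. `horspool_search` is not the full Boyer and Moore algorithm: it keeps only a single shift table indexed by the text character aligned with the last pattern position, a simplification commonly known as the Boyer-Moore-Horspool variant. It compares right to left and can skip whole stretches of text, but without the second ("good suffix") shift table used by the full algorithm its worst case degrades to O(mn).

```python
def naive_search(text: str, pattern: str) -> tuple[int, int]:
    """Find the first occurrence of pattern in text by trying every alignment.

    Returns (index of first match or -1, number of character comparisons).
    """
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):
        j = 0
        # Compare left to right until a mismatch or a full match.
        while j < m:
            comparisons += 1
            if text[i + j] != pattern[j]:
                break
            j += 1
        if j == m:
            return i, comparisons
    return -1, comparisons


def horspool_search(text: str, pattern: str) -> tuple[int, int]:
    """Bad-character-only simplification of Boyer and Moore (Horspool variant).

    Returns (index of first match or -1, number of character comparisons).
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return 0, 0
    # shift[c] = distance from the rightmost occurrence of c in pattern[:-1]
    # to the last pattern position; characters absent from the table shift by m.
    shift = {}
    for k in range(m - 1):
        shift[pattern[k]] = m - 1 - k
    comparisons = 0
    i = 0  # current alignment: pattern[0] sits under text[i]
    while i <= n - m:
        j = m - 1
        # Compare right to left, starting at the last pattern character.
        while j >= 0:
            comparisons += 1
            if text[i + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            return i, comparisons
        # Slide the pattern using the text character under its last position.
        i += shift.get(text[i + m - 1], m)
    return -1, comparisons


text = "here is a simple example"
print(naive_search(text, "example"))     # (17, 27)
print(horspool_search(text, "example"))  # (17, 15)
```

On this short sentence the shift-based search inspects 15 characters against 27 for the naive scan. This only illustrates the skipping behaviour; it is not a timing measurement.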
Journal title:
- Softw., Pract. Exper.
Volume 10, Issue
Pages -
Publication date: 1980